
A Brief Look at Log Collection for Microservices on Kubernetes


In a traditional VM or bare-metal environment, log files usually live under fixed paths, and they survive application restarts and abnormal exits. Kubernetes, by contrast, schedules resources at a much finer granularity: containers (and Pods) are short-lived, and the moment the main process exits, the container (or Pod) is destroyed and its associated resources are released. Log collection under Kubernetes is therefore more complicated than in traditional environments, with more angles to consider.

Broadly speaking, log collection under Kubernetes follows one of these patterns:

| | DockerEngine | Direct write from the app | DaemonSet | Sidecar |
|---|---|---|---|---|
| Log types collected | stdout | application logs | stdout + some files | files |
| Deployment & ops effort | Low | Low | Moderate: just maintain the DaemonSet | High: every Pod whose logs are collected needs its own sidecar container |
| Degree of isolation | Weak | Weak | Moderate: isolation only via configuration | Strong: container-level isolation with dedicated resources |
| Suitable scenarios | Test environments | Workloads with extreme performance requirements | Clusters with a clear log taxonomy and a single purpose | Large clusters, PaaS-style clusters |

- DockerEngine direct write is generally not recommended and rarely used.
- Direct write from the application is recommended when log volume is extremely large.
- DaemonSet suits small and medium clusters of up to roughly 1,000 nodes.
- Sidecar is recommended for very large clusters or for complex logging requirements.

Because our company needs log processing and analysis, we run the DaemonSet and Sidecar modes side by side; the direct-write approach is also used in a handful of complex scenarios. This article therefore mainly covers the first two collection modes.

Collecting logs with a DaemonSet

Since we never run containers directly under Kubernetes (the smallest unit Kubernetes schedules and manages is the Pod), Kubernetes symlinks container logs under the /var/log/pods/ and /var/log/containers paths to help us manage them. Log in to any k8s node:

```
[root@ali-k8s-test-002 ~]# cd /var/log/pods/
[root@ali-k8s-test-002 pods]# ls
comp-tools_skywalking-oap-ff949d984-kqnkx_51faece4-3b59-429d-99bf-a8fea9726555
comp-tools_skywalking-ui-95bd55c59-5x2qf_1422b8bd-988a-4739-bc96-53ccd9e164e6
kubesphere-controls-system_kubectl-zhangminghao-6c654bc9c8-m46sj_5eae93cb-fbf4-46a6-ba5c-c728ccb73d1b
kubesphere-devops-system_ks-jenkins-645b997d5f-tvlrs_a0d2ab73-d440-4d85-aa7c-612c3415341e
kubesphere-logging-system_elasticsearch-logging-data-1_6baa822d-f877-4f5f-ba5f-3e0e98a7d617
kubesphere-logging-system_elasticsearch-logging-discovery-0_0611bdb7-1989-4a06-a333-06ff045a4b1d
[root@ali-k8s-test-002 pods]# cd /var/log/containers
[root@ali-k8s-test-002 containers]# ll
total 180
lrwxrwxrwx 1 root root 136 Apr 12 2021 ack-node-problem-detector-daemonset-vb4wm_kube-system_ack-node-problem-detector-3a8538726f9943d78e81395f29a4c39f3a831042b6264b34bc76068810272b78.log -> /var/log/pods/kube-system_ack-node-problem-detector-daemonset-vb4wm_3d088618-4af7-4def-913a-b85a74b06911/ack-node-problem-detector/0.log
lrwxrwxrwx 1 root root 136 May 11 2021 ack-node-problem-detector-daemonset-vb4wm_kube-system_ack-node-problem-detector-6b6d7ef089f0fc352afcf3f04aa2a189e92483580e4797fab9298ed7e7eae43f.log -> /var/log/pods/kube-system_ack-node-problem-detector-daemonset-vb4wm_3d088618-4af7-4def-913a-b85a74b06911/ack-node-problem-detector/1.log
```
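To confirm how these directory names map back to Pods, we can cross-check a Pod's UID against the path. A quick sketch (the namespace and pod name are taken from the listing above; adjust them for your cluster):

```bash
# Resolve a Pod's UID, then list its on-disk log directory on the node
# that runs it. Each container gets its own subdirectory, holding one
# numbered log file per restart (0.log, 1.log, ...).
NS=comp-tools
POD=skywalking-oap-ff949d984-kqnkx
POD_UID=$(kubectl -n "$NS" get pod "$POD" -o jsonpath='{.metadata.uid}')
ls "/var/log/pods/${NS}_${POD}_${POD_UID}/"
```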

From these paths and symlink targets, the naming convention is apparent: /var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log, with symlinks named /var/log/containers/<pod_name>_<namespace>_<container_name>-<container_id>.log pointing at them. (Further reading: Where are Kubernetes' pods logfiles? – StackOverflow)

So all we need to do is run a collector such as filebeat on every node and harvest the logs under these paths. How, then, do we conveniently get one collector onto every k8s node? This is where the DaemonSet resource type comes in:

A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them; as nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up all the Pods it created. (DaemonSet | Kubernetes)

[Figure: DaemonSet log collection architecture]

Note that this mode requires applications to standardize on writing logs to standard output/standard error; only then will the logging driver capture them and write them into the log files above. Also, the serverless virtual k8s nodes offered by the mainstream cloud providers currently do not support DaemonSets, so if that is your scenario you will need another way to collect logs.
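For an application that insists on writing to files, one common adaptation is to symlink those files to the container's stdout/stderr at image build time, the same trick the official nginx image uses (the paths below are nginx's own):

```bash
# Redirect nginx's file logs to the container's stdout/stderr so the
# container runtime picks them up under /var/log/containers/.
ln -sf /dev/stdout /var/log/nginx/access.log
ln -sf /dev/stderr /var/log/nginx/error.log
```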

Let's take a quick look at this mode's filebeat deployment manifest and configuration. First, the DaemonSet plus its RBAC objects:

```yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: filebeat
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      hostAliases:
      - ip: "192.168.201.126"
        hostnames:
        - "kafka01"
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.14.2
        args: ["-c", "/etc/filebeat.yml", "-e"]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 800Mi
          requests:
            cpu: 400m
            memory: 200Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: dockersock
          mountPath: /var/run/docker.sock
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: varlog
        hostPath:
          path: /var/log
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
# RBAC configuration
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: kube-system
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: kube-system
  labels:
    app: filebeat
rules:
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    app: filebeat
rules:
- apiGroups: [""]
  resources:
  - configmaps
  resourceNames:
  - kubeadm-config
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    app: filebeat
---
```
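Applying these manifests (together with the ConfigMap below) and watching the rollout is the usual routine; a quick sketch, assuming everything is saved in one file named filebeat-daemonset.yml:

```bash
kubectl apply -f filebeat-daemonset.yml
# Wait until one filebeat Pod is ready on every eligible node.
kubectl -n kube-system rollout status daemonset/filebeat
kubectl -n kube-system get pod -l app=filebeat -o wide
```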

The filebeat configuration:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: filebeat-config
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          templates:
            - condition:
                equals:
                  kubernetes.labels.filebeat_harvest: "true"
              config:
                - type: container
                  encoding: utf-8
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log
                  # exclude_lines: ["^\\s+[\\-`('.|_]"]
                  # multiline:
                  #   max_lines: 10000
                  #   pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
                  #   negate: true
                  #   match: after
                  # symlinks: true
                  # Drop logs produced by sidecar-mode filebeat containers
                  processors:
                    - drop_event.when:
                        contains:
                          kubernetes.container.name: "filebeat"
    processors:
      - drop_fields:
          fields:
            - "@metadata"
            - "beat"
            - "kubernetes.labels"
            - "kubernetes.container"
            - "kubernetes.annotations"
            - "host"
            - "prospector"
            - "input"
            - "offset"
            - "stream"
            - "source"
            - "agent.ephemeral_id"
            - "agent.hostname"
            - "agent.id"
            - "agent.name"
            - "agent.type"
            - "agent.version"
            - "host.name"
            - "input.type"
            - "ecs.version"
            - "log.offset"
            - "log.flags"
            - "log.file.path"
            - "version"
    output.elasticsearch:
      hosts: ["http://172.18.145.131:9200"]
      enabled: true
      worker: 1
      compression_level: 3
      indices:
        - index: "%{[kubernetes.labels.filebeat_index]}-%{+yyyy.MM.dd}"
    setup.ilm.enabled: false
    setup.template.name: "ms"
    setup.template.pattern: "ms-*"
    setup.template.settings:
      index.number_of_shards: 1
      index.number_of_replicas: 0
```
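With this config, collection is opt-in: the autodiscover condition only matches Pods labeled filebeat_harvest=true, and the filebeat_index label picks the Elasticsearch index prefix. A sketch of opting a workload in (all names here are made-up examples):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
spec:
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
        filebeat_harvest: "true"    # matched by the autodiscover condition above
        filebeat_index: ms-demo-api # events land in index "ms-demo-api-<date>"
    spec:
      containers:
      - name: demo-api
        image: nginx:1.21 # placeholder image
```

Note that the index prefix should match the template pattern configured above (ms-*) for the index template to apply.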

Checking how the filebeat instances are distributed across the cluster, we can see one filebeat Pod running on every k8s node:

```
# kubectl get pod -A -o wide | grep filebeat
kube-system   filebeat-247nn   1/1   Running   0   24d   172.18.147.6     cn-shenzhen.172.18.147.6
kube-system   filebeat-2bvnm   1/1   Running   0   24d   172.18.146.224   cn-shenzhen.172.18.146.224
kube-system   filebeat-44b2q   1/1   Running   0   24d   172.18.146.200   cn-shenzhen.172.18.146.200
kube-system   filebeat-dh247   1/1   Running   0   24d   172.18.147.14    cn-shenzhen.172.18.147.14
kube-system   filebeat-qvrlk   1/1   Running   0   24d   172.18.146.201   cn-shenzhen.172.18.146.201
kube-system   filebeat-r4xmw   1/1   Running   0   24d   172.18.146.223   cn-shenzhen.172.18.146.223
# kubectl get node
NAME                         STATUS   ROLES    AGE    VERSION
cn-shenzhen.172.18.146.200   Ready    <none>   229d   v1.18.8-aliyun.1
cn-shenzhen.172.18.146.201   Ready    <none>   229d   v1.18.8-aliyun.1
cn-shenzhen.172.18.146.223   Ready    worker   238d   v1.18.8-aliyun.1
cn-shenzhen.172.18.146.224   Ready    <none>   238d   v1.18.8-aliyun.1
cn-shenzhen.172.18.147.14    Ready    <none>   174d   v1.18.8-aliyun.1
cn-shenzhen.172.18.147.6     Ready    <none>   209d   v1.18.8-aliyun.1
```

Collecting logs with a sidecar

First, let's clarify what a sidecar is:

A sidecar is literally the passenger car bolted to the side of a motorcycle, like the police trikes in Hong Kong crime films: the sidecar belongs to the motorcycle and plays a purely supporting role. In k8s, the Pod is the smallest unit of management, and one Pod can contain one or more containers. Put simply, if you think of a Pod as a virtual machine, the containers are the processes running inside that VM.

Having likened a Pod to a traditional VM, we can reuse the traditional VM pattern for collecting logs: run one filebeat instance per machine and harvest the log files under designated paths. Translated to k8s, that means each business Pod runs a filebeat (or similar collector) container alongside the application container; the application container writes its log files to a shared volume, and the filebeat container mounts the same volume to collect them. The biggest benefits are that collection and processing rules are more flexible than in the DaemonSet mode, and applications migrated from traditional servers into K8S can keep their existing logging configuration.

[Figure: sidecar log collection architecture]
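When logs do not need to be archived, the shared volume can simply be an ephemeral emptyDir; a minimal sketch of the wiring (container and volume names here are made up):

```yaml
spec:
  containers:
  - name: app           # the business container writes its logs here
    volumeMounts:
    - name: shared-logs
      mountPath: /data/logs/app
  - name: filebeat      # the sidecar reads the very same directory
    volumeMounts:
    - name: shared-logs
      mountPath: /data/logs/app
  volumes:
  - name: shared-logs
    emptyDir: {}
```

Our own setup below uses NFS instead, because we also archive the log files.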

Let's also take a quick look at this mode's filebeat deployment manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    tier: backend
  name: chat-api
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: backend
  template:
    metadata:
      labels:
        tier: backend
    spec:
      hostAliases:
      - ip: "192.168.201.1"
        hostnames:
        - "kafka01"
      nodeSelector:
        common: prod
      containers:
      - name: chat-api
        image: xxxx
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 1000m
            memory: 1024Mi
        volumeMounts:
        - name: service-logs-nas
          mountPath: /data/logs/chat
      - name: filebeat
        image: filebeat:6.4.2
        args: ["-c", "/etc/filebeat.yml", "-e"]
        resources:
          requests:
            cpu: 10m
            memory: 30Mi
          limits:
            memory: 500Mi
        securityContext:
          runAsUser: 0
        volumeMounts:
        - name: filebeat-config
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
        - name: service-logs-nas
          mountPath: /data/logs/chat
      volumes:
      # NFS is used as the shared volume here because we need to archive the logs.
      # Without an archiving requirement, an emptyDir volume is enough.
      - name: service-logs-nas
        nfs:
          path: /prod-api
          server: xxxx.cn-shenzhen.nas.aliyuncs.com
      - name: filebeat-config
        configMap:
          name: filebeat-sidecar-chat-config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
```

The filebeat configuration is as follows:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-sidecar-chat-config
  namespace: default
data:
  filebeat.yml: |-
    name: k8s-chat
    filebeat.shutdown_timeout: 3s
    filebeat.prospectors:
    - input_type: log
      paths:
        - /data/logs/chat/*${HOSTNAME}*.log
      json.keys_under_root: true
      json.add_error_key: true
      ignore_older: 12h
      close_removed: true
      clean_removed: true
      close_inactive: 2h
      fields:
        k8s_nodename: ${NODE_NAME}
        k8s_namespace: ${POD_OWN_NAMESPACE}
        type: k8s-chat
        format: json
    output.kafka:
      hosts: ["kafka01:20001"]
      topic: '%{[fields.type]}'
      partition.round_robin:
        reachable_only: true
      username: "xxx"
      password: "xxx"
      required_acks: 1
      compression: gzip
      max_message_bytes: 1000000
      worker: 1
```
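This config expands ${NODE_NAME} and ${POD_OWN_NAMESPACE}, so the filebeat sidecar container has to receive those variables via the downward API (the describe output below confirms they are set). A sketch of the env block to merge into the filebeat container spec above:

```yaml
env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName        # the node the Pod landed on
- name: POD_OWN_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace   # the Pod's own namespace
```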

Now let's see what form the filebeat instance takes inside k8s:

```
# kubectl describe pod chat-api-74fb6c5c4c-zfs78
Name:         chat-api-74fb6c5c4c-zfs78
Namespace:    default
Priority:     0
Node:         ali-hn-k8s05012-chat-prod/172.18.205.12
Start Time:   Thu, 02 Dec 2021 23:04:31 +0800
Labels:       tier=backend
Status:       Running
IP:           172.18.128.168
IPs:
  IP:           172.18.128.168
Controlled By:  ReplicaSet/chat-api-74fb6c5c4c
Containers:
  chat-socket:
    Container ID:   docker://e8ede98cc8c06040797378bf0563fe39949d2563b13e712d171f64abcfaebd02
    Image:          xxxx
    Image ID:       xxxx
    Port:           8002/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 02 Dec 2021 23:04:45 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      1
      memory:   2Gi
    Liveness:   tcp-socket :8002 delay=5s timeout=1s period=2s #success=1 #failure=5
    Readiness:  tcp-socket :8002 delay=5s timeout=1s period=2s #success=1 #failure=5
    Environment:
      CACHE_IGNORE:             js|html
      CACHE_PUBLIC_EXPIRATION:  3d
      ENV:                      test
      SERVICE_PROD:             $PORT
      SERVICE_TIMEOUT:          120
      POD_OWN_IP_ADDRESS:        (v1:status.podIP)
      POD_OWN_NAME:             chat-api-74fb6c5c4c-zfs78 (v1:metadata.name)
      POD_OWN_NAMESPACE:        default (v1:metadata.namespace)
      SERVICE_BRANCH_NAME:      master
      NODE_SERVER_TYPE:         appForSocket
    Mounts:
      /data/logs/chat from service-logs-nas (rw)
  filebeat:
    Container ID:  docker://b0df7f03590f09bc7eaf31fa931648c2b4bb652cff98abb8c8872cd11d8f7bea
    Image:         filebeat:6.3.2
    Image ID:      docker-pullable://docker.elastic.co/beats/filebeat@sha256:af6eb732fece856e010a2c40a68d76052b64409a5d19b114686db269af01436f
    Port:          <none>
    Host Port:     <none>
    Args:
      -c
      /etc/filebeat.yml
    State:          Running
      Started:      Thu, 02 Dec 2021 23:04:45 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  500Mi
    Requests:
      cpu:     10m
      memory:  30Mi
    Environment:
      NODE_NAME:           (v1:spec.nodeName)
      POD_OWN_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /data/logs/chat from service-logs-nas (rw)
      /etc/filebeat.yml from filebeat-config (rw,path="filebeat.yml")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-95mrq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  service-logs-nas:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    xxxx.cn-shenzhen.nas.aliyuncs.com
    Path:      /prod-socket
    ReadOnly:  false
  filebeat-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      filebeat-sidecar-chat-config
    Optional:  false
Events:        <none>
```

As you can see, this Pod holds two containers: the business application container and the filebeat instance.
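Since the Pod has more than one container, kubectl logs needs the -c flag to pick one; for example, to inspect the sidecar collector itself (pod and container names are from the describe output above):

```bash
kubectl logs chat-api-74fb6c5c4c-zfs78 -c filebeat    # the collector's own output
kubectl logs chat-api-74fb6c5c4c-zfs78 -c chat-socket # the application's stdout
```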

Collecting logs by writing directly from the application

There is not much special to say about this mode: the application pushes its logs directly from its logging plugin; logging engines such as log4j can easily ship logs to middleware like Kafka, or to a remote Elasticsearch. The coupling is relatively high, and the mode suits fairly complex scenarios or ones with extreme performance requirements.
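As a rough illustration only, log4j2 ships a Kafka appender that can be wired up in a log4j2.yaml; this sketch assumes the Kafka client and the jackson YAML dataformat are on the classpath, and the broker address, topic, and pattern are all placeholders:

```yaml
Configuration:
  status: warn
  Appenders:
    Kafka:
      name: KafkaAppender
      topic: app-logs                      # placeholder topic
      PatternLayout:
        pattern: "%d{ISO8601} %p %c{1} - %m%n"
      Property:
        name: bootstrap.servers
        value: "kafka01:20001"             # placeholder broker address
  Loggers:
    Root:
      level: info
      AppenderRef:
        ref: KafkaAppender
```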

Closing thoughts

There are few technical silver bullets that satisfy every requirement of a complex production environment in one shot. Log collection is no exception: several approaches usually end up coexisting, and we have to choose among them according to our actual needs.


